probability estimation and adversarial robustness
Reviews: Error Correcting Output Codes Improve Probability Estimation and Adversarial Robustness of Deep Neural Networks
Summary: The region of uncertainty (prediction probability close to 0.5) for softmax of logits is extremely small near an M-1 dimensional hyperplane in the logits space. The reason is changing one of the logits for one of the classes affects the probability vectors in all dimensions. The authors show that, if each logit is first converted to an independent probability using 1/(1 exp(-x)) function and the probability vector correlated with each codeword of an error correcting in a soft way to decode, this method has a large volume of uncertainty. The volume of uncertainty is larger when the min hamming distance of the code is large. This because multiple logits must be changed at the same time to cause a wrong decoding.
Reviews: Error Correcting Output Codes Improve Probability Estimation and Adversarial Robustness of Deep Neural Networks
This paper proposes the use of error correcting codes as class representations to improve robustness for adversarial attacks. The main idea of error correcting output codes is well-known, but this is the paper that shows that such ideas can be used for adversarial robustness. The paper shows very promising results especially in the rebuttal for CIFAR10. The distance bound is equivalent to Plotkin as the reviewer pointed out so this should be fixed in the paper.
Error Correcting Output Codes Improve Probability Estimation and Adversarial Robustness of Deep Neural Networks
Modern machine learning systems are susceptible to adversarial examples; inputs which clearly preserve the characteristic semantics of a given class, but whose classification is (usually confidently) incorrect. Existing approaches to adversarial defense generally rely on modifying the input, e.g. However, recent research has shown that most such approaches succumb to adversarial examples when different norms or more sophisticated adaptive attacks are considered. In this paper, we propose a fundamentally different approach which instead changes the way the output is represented and decoded. This simple approach achieves state-of-the-art robustness to adversarial examples for L 2 and L based adversarial perturbations on MNIST and CIFAR10.
Error Correcting Output Codes Improve Probability Estimation and Adversarial Robustness of Deep Neural Networks
Verma, Gunjan, Swami, Ananthram
Modern machine learning systems are susceptible to adversarial examples; inputs which clearly preserve the characteristic semantics of a given class, but whose classification is (usually confidently) incorrect. Existing approaches to adversarial defense generally rely on modifying the input, e.g. However, recent research has shown that most such approaches succumb to adversarial examples when different norms or more sophisticated adaptive attacks are considered. In this paper, we propose a fundamentally different approach which instead changes the way the output is represented and decoded. This simple approach achieves state-of-the-art robustness to adversarial examples for L 2 and L based adversarial perturbations on MNIST and CIFAR10. In addition, even under strong white-box attacks, we find that our model often assigns adversarial examples a low probability; those with high probability are usually interpretable, i.e. perturbed towards the perceptual boundary between the original and adversarial class.